1. Introduction



The purpose of this short note is to generalize to arbitrary $f$-divergences a result proved by Gel'fand
and Yaglom [1] and Perez [2] for the Information Divergence and, more recently, by Dukkipati et al.
[3] for Tsallis' and Rényi's divergences. Our method focuses on the fundamental notion of convexity
of the generating function $f$ together with some standard integration results, thus stressing the
fact that many properties of the Information Divergence can be extended to the general class of
$f$-divergences (cf. [HE]).

The rest of the note is organized as follows. In this introduction we set up the basic definitions 
and notation and state the main result, which is then proved in Section 2. 

Consider two probability measures $P$ and $R$ on a measurable space $(X, \mathcal{A})$ and let $p$ and $r$ be
their Radon-Nikodym derivatives with respect to a common dominating measure $\mu$, which without
loss of generality can be taken to be $\mu = P + R$. The differential version of the Information or
Kullback-Leibler divergence is $I(P\|R) = \int_X \ln(p/r)\, p\, d\mu$. Gel'fand and Yaglom [1] and Perez [2]
(see also [6, Theorem 2.4.2]) showed that

$$I(P\|R) = \sup_{\pi} \sum_{k=1}^{m} P(E_k) \ln \frac{P(E_k)}{R(E_k)},$$

where the supremum is taken over all finite measurable partitions $\pi = \{E_1, \ldots, E_m\}$ ($m \geq 1$) of
$X$. In other words, by discretizing both $P$ and $R$ and computing the corresponding divergence one
can get as close as wanted to $I(P\|R)$. Recently Dukkipati et al. [3] proved a similar result for
Rényi's family of divergences

$$I_\alpha(P\|R) = \frac{1}{\alpha - 1} \ln \int_X p^{\alpha} r^{1-\alpha}\, d\mu$$

and hence also for the Tsallis' divergences

$$T_\alpha(P\|R) = \frac{1}{\alpha - 1} \left[ \int_X p^{\alpha} r^{1-\alpha}\, d\mu - 1 \right]$$

($\alpha > 0$). Their proof relies on measure theoretic considerations along with the inequality
$[P(E)]^{\alpha} \leq \left[\int_E (dP/dR)^{\alpha}\, dR\right] [R(E)]^{\alpha - 1}$, which follows from Hölder's Inequality.
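The discretization statement of Gel'fand, Yaglom and Perez can be checked numerically on a simple case. The sketch below is only an illustration under assumptions that are not in the original note: $P$ has density $p(x) = 2x$ on $[0, 1]$, $R$ is uniform, and the partitions are the $m$ equal subintervals. The function name `discretized_kl` and the constant `EXACT_KL` are hypothetical.

```python
import math

def discretized_kl(m):
    """Information divergence between the discretizations of P and R.

    Assumed example: P has density p(x) = 2x on [0, 1] and R is uniform.
    For the cell E_k = ((k-1)/m, k/m] this gives
    P(E_k) = (2k - 1) / m**2 and R(E_k) = 1 / m.
    """
    total = 0.0
    for k in range(1, m + 1):
        p_cell = (2 * k - 1) / m**2
        r_cell = 1.0 / m
        total += p_cell * math.log(p_cell / r_cell)
    return total

# Closed form of the integral of 2x * ln(2x) over [0, 1].
EXACT_KL = math.log(2) - 0.5

if __name__ == "__main__":
    # Each partition refines the previous one, so the values should not decrease.
    for m in (1, 2, 4, 8, 64, 1024):
        print(m, discretized_kl(m), EXACT_KL)
```

Along these nested partitions the discretized values increase towards $\ln 2 - 1/2$ without exceeding it, which is the behaviour the supremum formula describes.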

Briefly, the $f$-divergence generated by $f$ is $D_f(P,R) = \int f(p/r)\, r\, d\mu$, where $f : [0, \infty) \to \mathbb{R}$
is convex, $f(1) = 0$ and, to avoid undefined expressions, $f(0) = \lim_{u \downarrow 0} f(u)$, $0 \cdot f(0/0) = 0$ and
$0 \cdot f(a/0) = \lim_{\epsilon \to 0} \epsilon f(a/\epsilon) = a \lim_{u \to \infty} f(u)/u$. The class of $f$-divergences was introduced by
Csiszár [8] and Ali and Silvey [9] and includes, besides the Information Divergence $I(P\|R) =
D_{u \ln u}(P, R)$ and the family of Tsallis' divergences $T_\alpha(P\|R) = D_{[u^{\alpha} - 1]/(\alpha - 1)}(P, R)$, the
variational distance ($f(u) = |u - 1|$), the $\chi^2$ divergence ($f(u) = (u - 1)^2$), the Hellinger discrimination
($f(u) = (\sqrt{u} - 1)^2$) and many other distances and discrepancy measures between probability
measures. While Rényi's divergences are not properly $f$-divergences, they are functions of them (i.e.
$I_\alpha(P\|R) = (\alpha - 1)^{-1} \ln[1 + (\alpha - 1) T_\alpha(P\|R)]$).
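For finite sample spaces the generators listed above can be written down directly. The following sketch assumes strictly positive probability vectors, so that none of the conventions for $f(0)$ or $0 \cdot f(a/0)$ is needed. All function names are hypothetical, and `renyi_direct` uses the usual sum form of Rényi's divergence. It illustrates that Rényi's divergence is recovered from the Tsallis $f$-divergence through the relation just stated.

```python
import math

def f_divergence(f, P, R):
    """D_f(P, R) = sum over x of f(p(x)/r(x)) r(x), for strictly positive P and R."""
    return sum(f(p / r) * r for p, r in zip(P, R))

# Generating functions named in the text; each is convex with f(1) = 0.
GENERATORS = {
    "information": lambda u: u * math.log(u),
    "variational": lambda u: abs(u - 1),
    "chi_square": lambda u: (u - 1) ** 2,
    "hellinger": lambda u: (math.sqrt(u) - 1) ** 2,
}

def tsallis_generator(alpha):
    """Generator [u**alpha - 1] / (alpha - 1) of the Tsallis divergence (alpha != 1)."""
    return lambda u: (u**alpha - 1) / (alpha - 1)

def renyi_from_tsallis(alpha, P, R):
    """Renyi divergence obtained as a function of the Tsallis f-divergence."""
    t = f_divergence(tsallis_generator(alpha), P, R)
    return math.log(1 + (alpha - 1) * t) / (alpha - 1)

def renyi_direct(alpha, P, R):
    """Renyi divergence computed from its usual sum formula."""
    s = sum(p**alpha * r ** (1 - alpha) for p, r in zip(P, R))
    return math.log(s) / (alpha - 1)

if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    R = [0.2, 0.3, 0.5]
    for name, f in GENERATORS.items():
        print(name, f_divergence(f, P, R))
    print(renyi_from_tsallis(2.0, P, R), renyi_direct(2.0, P, R))
```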

Our main result, which includes the Information and the Tsallis' divergences as special
cases, is the following.

Proposition 1. Let $f$ and $D_f$ be as defined above. Then for any $P$ and $R$

$$D_f(P,R) = \sup_{\pi} \sum_{E \in \pi} f\!\left( \frac{P(E)}{R(E)} \right) R(E), \qquad (1)$$

where the supremum is taken over all finite measurable partitions $\pi$ of $X$. □
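Since Rényi's divergences $I_\alpha(P\|R) = (\alpha - 1)^{-1} \ln[1 + (\alpha - 1) T_\alpha(P\|R)]$ are a continuous,
monotone increasing function of Tsallis' divergences, it follows from (1) that

$$I_\alpha(P\|R) = \sup_{\pi} \frac{1}{\alpha - 1} \ln \sum_{E \in \pi} P(E)^{\alpha} R(E)^{1-\alpha},$$

which is the main result of Dukkipati et al. [3].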
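Proposition 1 can be illustrated numerically for generators other than $u \ln u$. The sketch below again assumes, purely for illustration, that $P$ has density $p(x) = 2x$ on $[0, 1]$ and that $R$ is uniform. The closed forms in `CASES` are the integrals $\int_0^1 f(2x)\, dx$ worked out by hand for this example, and `partition_sum` is a hypothetical helper that evaluates the right hand side of (1) on $m$ equal cells.

```python
import math

# Assumed example: p(x) = 2x on [0, 1], R uniform.
# Each entry is (generator f, closed form of D_f(P, R) for this example).
CASES = {
    "variational": (lambda u: abs(u - 1), 0.5),
    "chi_square": (lambda u: (u - 1) ** 2, 1.0 / 3.0),
    "hellinger": (lambda u: (math.sqrt(u) - 1) ** 2,
                  2.0 - 4.0 * math.sqrt(2.0) / 3.0),
}

def partition_sum(f, m):
    """Sum of f(P(E)/R(E)) R(E) over the m equal cells of [0, 1]."""
    r_cell = 1.0 / m
    total = 0.0
    for k in range(1, m + 1):
        p_cell = (2 * k - 1) / m**2  # P of the k-th cell when p(x) = 2x
        total += f(p_cell / r_cell) * r_cell
    return total

if __name__ == "__main__":
    for name, (f, exact) in CASES.items():
        sums = [partition_sum(f, 2**j) for j in range(0, 11, 2)]
        print(name, sums, exact)
```

In each case the partition sums stay below the closed form and approach it as the partition is refined. For the variational distance the value $1/2$ is already attained with two cells, since the kink of $|u - 1|$ then falls on a cell boundary.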



2. Proof of Proposition 1 

We begin with some preliminary considerations. First, note that both sides of (1) remain the same
if we replace $f(u)$ by $\tilde{f}(u) = f(u) - a(u - 1)$. By taking $y = a(u - 1)$ to be a support line to the
graph of $f$ at $u = 1$ we see that we can assume without loss of generality that $f(u)$ is nonnegative,
nonincreasing for $u < 1$ and nondecreasing for $u > 1$. Second, since $P(A) = \int_A (p/r)\, r\, d\mu$, if
$a \leq p(x)/r(x) \leq b$ on $A$, then also $a \leq P(A)/R(A) \leq b$. Finally, the left hand side of (1) is greater
than or equal to the right hand side because, if $\pi = \{E_j : j \in J\}$ is a finite partition of $X$,
Jensen's inequality implies that

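$$\int f(p/r)\, r\, d\mu = \sum_{j \in J} R(E_j) \int_{E_j} f(p/r)\, \frac{r}{R(E_j)}\, d\mu \geq \sum_{j \in J} R(E_j)\, f\!\left( \int_{E_j} \frac{p}{r}\, \frac{r}{R(E_j)}\, d\mu \right) = \sum_{j \in J} f\!\left( \frac{P(E_j)}{R(E_j)} \right) R(E_j). \qquad (2)$$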
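The two reductions used so far, subtracting a support line and applying Jensen's inequality cell by cell, can be checked on a finite sample space. The sketch below assumes two randomly generated, strictly positive distributions on six points and uses the generator $u \ln u$, whose support line at $u = 1$ has slope $a = 1$. The helper names are hypothetical.

```python
import math
import random

def partition_sum(f, P, R, cells):
    """Sum over cells E of f(P(E)/R(E)) R(E); `cells` is a list of index lists."""
    total = 0.0
    for cell in cells:
        p_cell = sum(P[i] for i in cell)
        r_cell = sum(R[i] for i in cell)
        total += f(p_cell / r_cell) * r_cell
    return total

def random_distribution(n, rng):
    """A strictly positive probability vector of length n."""
    weights = [rng.random() + 0.05 for _ in range(n)]
    s = sum(weights)
    return [w / s for w in weights]

def f(u):
    return u * math.log(u)       # generator of the Information Divergence

def f_tilde(u):
    return f(u) - 1.0 * (u - 1)  # f minus its support line at u = 1 (slope 1)

rng = random.Random(0)           # fixed seed so the illustration is repeatable
P = random_distribution(6, rng)
R = random_distribution(6, rng)
fine = [[0], [1], [2], [3], [4], [5]]   # the partition into single points
coarse = [[0, 1], [2, 3, 4], [5]]       # a partition that `fine` refines

if __name__ == "__main__":
    print(partition_sum(f, P, R, fine), partition_sum(f_tilde, P, R, fine))
    print(partition_sum(f, P, R, coarse), partition_sum(f_tilde, P, R, coarse))
```

The sums for `f` and `f_tilde` agree on every partition because the subtracted term adds up to $a\,[\sum_E P(E) - \sum_E R(E)] = 0$, and the coarse partition never gives a larger value than the fine one, as (2) predicts.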



We will now prove (1) in the case that $D_f(P,R) < \infty$. Due to the last consideration above,
it will be enough to prove that the left hand side of (1) is less than or equal to the right
hand side or, equivalently, that given any $\epsilon > 0$ there exists a partition $\pi$ such that the
difference between the leftmost and the rightmost sides of (2) is less than or equal to $\epsilon$. To do
this, consider $0 < H < K$ and define $A_H = \{x \in X : p(x) < H r(x)\}$, $C_K = \{x \in X :
p(x) > K r(x)\}$ and $B_{H,K} = X - (A_H \cup C_K) = \{x \in X : H r(x) \leq p(x) \leq K r(x)\}$. Since
$D_f(P,R) < \infty$, $\int_{A_H} f(p/r)\, r\, d\mu$ must also be finite. Hence (i) $\lim_{H \to 0} \int_{A_H} f(p/r)\, r\, d\mu = 0$ by
dominated convergence and (ii) also $\lim_{H \to 0} f[P(A_H)/R(A_H)]\, R(A_H) = 0$ because Jensen's
inequality implies that $0 \leq f[P(A_H)/R(A_H)]\, R(A_H) \leq \int_{A_H} f(p/r)\, r\, d\mu$. Therefore, for $H_0$ small
enough, $\int_{A_{H_0}} f(p/r)\, r\, d\mu - f[P(A_{H_0})/R(A_{H_0})]\, R(A_{H_0}) \leq \epsilon/3$. A similar argument shows that, for
$K_0$ large enough, $\int_{C_{K_0}} f(p/r)\, r\, d\mu - f[P(C_{K_0})/R(C_{K_0})]\, R(C_{K_0}) \leq \epsilon/3$. Next, since $f$ is convex, it
is continuous and hence absolutely (in particular, uniformly) continuous in $[H_0, K_0]$. Therefore, there exists a $\delta > 0$ such that
$|f(u) - f(u')| \leq \epsilon/3$ whenever $u, u' \in [H_0, K_0]$ and $|u - u'| < \delta$. With this in mind, partition the interval $[H_0, K_0]$ into (say)
$m$ subintervals $I_1, \ldots, I_m$, each having length less than $\delta$, and define $E_i = \{x \in X : p(x)/r(x) \in I_i\}$.
Since for $x \in E_i$ we have that $p(x)/r(x) \in I_i$ and hence also $P(E_i)/R(E_i) \in I_i$,

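$$\int_{E_i} f(p/r)\, r\, d\mu - f\!\left( \frac{P(E_i)}{R(E_i)} \right) R(E_i)
= \int_{E_i} \left[ f(p/r) - f\!\left( \frac{P(E_i)}{R(E_i)} \right) \right] r\, d\mu \leq \frac{\epsilon}{3} \int_{E_i} r\, d\mu = \frac{\epsilon}{3}\, R(E_i).$$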
To finish this part of the proof, consider the partition $\pi = \{E_0 = A_{H_0}, E_1, \ldots, E_m, E_{m+1} = C_{K_0}\}$.
The previous considerations imply that the difference between the leftmost and the rightmost terms
in (2) is less than or equal to $\epsilon/3 + (\epsilon/3) R(B_{H_0,K_0}) + \epsilon/3 \leq \epsilon$.
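The construction above, which groups points according to the subinterval containing the ratio $p/r$, can be imitated on a finite sample space. The sketch below is an illustration under stated assumptions rather than the argument of the proof: it uses the $\chi^2$ generator, random strictly positive distributions on 200 points, and half-open bins of length $\delta$. It replaces the uniform continuity bound $\epsilon/3$ by the explicit Lipschitz constant of $(u - 1)^2$ on the range of the ratios, and since all ratios are bounded no tail sets $A_{H_0}$ and $C_{K_0}$ are needed. All names are hypothetical.

```python
import random

def f(u):
    return (u - 1) ** 2  # chi-square generator

def random_distribution(n, rng):
    """A strictly positive probability vector of length n."""
    weights = [rng.random() + 0.05 for _ in range(n)]
    s = sum(weights)
    return [w / s for w in weights]

def level_set_partition(P, R, delta):
    """Group indices by the length-delta bin that contains the ratio p/r."""
    low = min(p / r for p, r in zip(P, R))
    cells = {}
    for i, (p, r) in enumerate(zip(P, R)):
        cells.setdefault(int((p / r - low) // delta), []).append(i)
    return list(cells.values())

def gap(P, R, cells):
    """D_f(P, R) minus the partition sum, i.e. leftmost minus rightmost term of (2)."""
    exact = sum(f(p / r) * r for p, r in zip(P, R))
    approx = 0.0
    for cell in cells:
        p_cell = sum(P[i] for i in cell)
        r_cell = sum(R[i] for i in cell)
        approx += f(p_cell / r_cell) * r_cell
    return exact - approx

rng = random.Random(1)  # fixed seed so the illustration is repeatable
P = random_distribution(200, rng)
R = random_distribution(200, rng)
ratios = [p / r for p, r in zip(P, R)]
# Lipschitz constant of f(u) = (u - 1)^2 on [min ratio, max ratio].
LIPSCHITZ = 2 * max(abs(min(ratios) - 1), abs(max(ratios) - 1))

if __name__ == "__main__":
    for delta in (0.5, 0.1, 0.01):
        print(delta, gap(P, R, level_set_partition(P, R, delta)), LIPSCHITZ * delta)
```

Within each cell both $p/r$ and $P(E_i)/R(E_i)$ lie in the same bin, so the gap is bounded by the Lipschitz constant times $\delta$ times $\sum_i R(E_i) = 1$, mirroring the $(\epsilon/3) R(B_{H_0,K_0})$ term of the proof.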

Now suppose that $D_f(P,R) = \infty$. Then either $\int_{\{p>r\}} f(p/r)\, r\, d\mu$ or $\int_{\{p<r\}} f(p/r)\, r\, d\mu$ must
be infinite. Suppose first that $\int_{\{p>r\}} f(p/r)\, r\, d\mu = \infty$. We will show that there is a sequence of
disjoint subsets $D_n$ such that $\sum_{n=1}^{\infty} f[P(D_n)/R(D_n)]\, R(D_n) = \infty$. This would imply, of course,
that the sets $D_n$ can be used to construct a partition of $X$ so that the rightmost term of (2)
is as large as wanted, and this in turn that the right hand side of (1) is infinite. Indeed, let
$D_n = \{x \in X : p(x) > r(x) \text{ and } (n - 1) \leq f[p(x)/r(x)] < n\}$ and for $n \geq 1$ define $b_n = \inf\{u \in
\mathbb{R} : u > 1 \text{ and } f(u) \geq n\}$. Since $f$ is continuous and (we are assuming wlog) nondecreasing
for $u > 1$, it follows that $D_n = \{x \in X : p(x) > r(x) \text{ and } b_{n-1} \leq p(x)/r(x) < b_n\}$. Hence,
$b_{n-1} \leq P(D_n)/R(D_n) \leq b_n$, $(n - 1) \leq f[P(D_n)/R(D_n)] \leq n$ and $\{f(p/r) - f[P(D_n)/R(D_n)]\} \leq 1$
in $D_n$. Therefore,

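$$\int_{\{p>r\}} f(p/r)\, r\, d\mu = \sum_{n=1}^{\infty} \int_{D_n} f(p/r)\, r\, d\mu \leq \sum_{n=1}^{\infty} \left\{ f\!\left( \frac{P(D_n)}{R(D_n)} \right) + 1 \right\} R(D_n) \leq \sum_{n=1}^{\infty} f\!\left( \frac{P(D_n)}{R(D_n)} \right) R(D_n) + 1.$$

This shows that if $\int_{\{p>r\}} f(p/r)\, r\, d\mu = \infty$, then so is $\sum_{n=1}^{\infty} f\!\left( \frac{P(D_n)}{R(D_n)} \right) R(D_n)$. The case
that $\int_{\{p<r\}} f(p/r)\, r\, d\mu = \infty$ is dealt with in a similar manner. □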